NYC Rat Sightings vs Subway Entrances - Animated MapΒΆ

This notebook presents an animated geospatial analysis of NYC rat sightings in relation to subway entrances. Motivated by the rise in reported sightings during the Covid-19 pandemic, the project explores whether rat movement patterns align with urban transit infrastructure. The analysis combines open data sources, nearest-neighbor modeling, and interactive visualization to illustrate how public datasets can be leveraged to study behavioral and environmental dynamics in a city.

Key Skills Demonstrated:

  • Data ingestion from public APIs (NYC Open Data via Socrata)
  • Data cleaning, preprocessing, and temporal aggregation (pandas)
  • Spatial analysis using nearest-neighbor search (scikit-learn KDTree)
  • Geospatial visualization and animation (Plotly/Mapbox)
  • Reproducible, interactive reporting (Jupyter Notebook)

⚠️ Tokens/Files:

  • Socrata app token is optional but recommended.
  • Mapbox is optional β€” if you don't have a token, the notebook uses a USGS raster tile as a free basemap layer.
InΒ [Β ]:
# (Optional) Install dependencies. Uncomment if needed.
# %pip install pandas sodapy plotly numpy scikit-learn
InΒ [1]:
import os
import numpy as np
import pandas as pd
from sodapy import Socrata
import plotly.express as px
from sklearn.neighbors import KDTree

# ---- Configuration ----
SOCRATA_DOMAIN = "data.cityofnewyork.us"
SOCRATA_APP_TOKEN = os.getenv("SOCRATA_APP_TOKEN", None)  # set in env or paste below (not recommended to commit)

# Local CSV fallback for subway entrances (if you have it):
LOCAL_MTA_CSV = "NYC_Transit_Subway_Entrance_And_Exit_Data.csv"

# If you don't have the local CSV, you can use a public URL for the same dataset:
# (You can replace with the official Open Data CSV export link if preferred.)
MTA_URL = "https://data.ny.gov/resource/i9wp-a4ja.csv?$query=SELECT%20division%2C%20line%2C%20borough%2C%20stop_name%2C%20complex_id%2C%20constituent_station_name%2C%20station_id%2C%20gtfs_stop_id%2C%20daytime_routes%2C%20entrance_type%2C%20entry_allowed%2C%20exit_allowed%2C%20entrance_latitude%2C%20entrance_longitude%2C%20entrance_georeference%20ORDER%20BY%20line%20ASC"

# Optional: Mapbox token (for nicer basemap). If not provided, a USGS raster layer is used.
MAPBOX_TOKEN = os.getenv("MAPBOX_TOKEN", None)  # or set to a string
if MAPBOX_TOKEN:
    px.set_mapbox_access_token(MAPBOX_TOKEN)
InΒ [3]:
# ---- 1) Download rat sightings (descriptor = 'Rat Sighting') ----
client = Socrata(SOCRATA_DOMAIN, SOCRATA_APP_TOKEN, timeout=480)
results = client.get("erm2-nwe9",
                     where="descriptor = 'Rat Sighting'",
                     limit=200000)  # adjust limit if needed

rats = pd.DataFrame.from_records(results)
# Keep rows with lat/lon
rats = rats[rats['latitude'].notna() & rats['longitude'].notna()].copy()

# Parse created_date
rats["created_date"] = pd.to_datetime(rats["created_date"], errors="coerce")
rats = rats.dropna(subset=["created_date"])
rats = rats.sort_values("created_date")

# Build Month Year string for animation
rats["Date"] = rats["created_date"].dt.strftime("%B %Y")

# Ensure numeric lat/lon
rats["latitude"] = pd.to_numeric(rats["latitude"], errors="coerce")
rats["longitude"] = pd.to_numeric(rats["longitude"], errors="coerce")
rats = rats.dropna(subset=["latitude", "longitude"]).reset_index(drop=True)

rats.head(3)
WARNING:root:Requests made without an app_token will be subject to strict throttling limits.
Out[3]:
unique_key created_date agency agency_name complaint_type descriptor location_type incident_zip incident_address street_name ... park_borough latitude longitude location closed_date facility_type resolution_description resolution_action_updated_date due_date Date
0 15633803 2010-01-01 08:29:58 DOHMH Department of Health and Mental Hygiene Rodent Rat Sighting 3+ Family Apt. Building 11206 202 PULASKI STREET PULASKI STREET ... BROOKLYN 40.692989 -73.943771 {'latitude': '40.69298896011082', 'longitude':... NaN N/A Please contact the Department of Health and Me... 2010-01-01T08:37:34.000 2010-01-31T08:29:58.000 January 2010
1 15633054 2010-01-01 11:20:45 DOHMH Department of Health and Mental Hygiene Rodent Rat Sighting 1-2 Family Dwelling 11365 59-13 159 STREET 159 STREET ... QUEENS 40.739983 -73.809299 {'latitude': '40.73998332248969', 'longitude':... NaN N/A Please contact the Department of Health and Me... 2010-01-01T11:30:23.000 2010-01-31T11:20:45.000 January 2010
2 15633896 2010-01-01 12:11:51 DOHMH Department of Health and Mental Hygiene Rodent Rat Sighting 3+ Family Apt. Building 10027 317 WEST 120 STREET WEST 120 STREET ... MANHATTAN 40.807367 -73.954388 {'latitude': '40.807367287308594', 'longitude'... NaN N/A Please contact the Department of Health and Me... 2010-01-01T12:18:09.000 2010-01-31T12:11:51.000 January 2010

3 rows Γ— 35 columns

InΒ [5]:
# ---- 2) Load subway entrances ----
# Try local CSV first, otherwise fetch via URL
if os.path.exists(LOCAL_MTA_CSV):
    mta = pd.read_csv(LOCAL_MTA_CSV)
else:
    try:
        mta = pd.read_csv(MTA_URL)
    except Exception as e:
        raise RuntimeError("Could not load MTA entrances. Provide LOCAL_MTA_CSV or check MTA_URL.") from e

# Standardize column names to locate lat/lon columns
cols = {c.lower(): c for c in mta.columns}
lat_col = None
lon_col = None

# Common column name possibilities
lat_col = cols["entrance_latitude"]
lon_col = cols["entrance_longitude"]

if lat_col is None or lon_col is None:
    raise ValueError("Could not find latitude/longitude columns in the MTA dataset.")

mta = mta.dropna(subset=[lat_col, lon_col]).copy()
mta_latlon = mta[[lat_col, lon_col]].astype(float).values

mta.head(3)
Out[5]:
division line borough stop_name complex_id constituent_station_name station_id gtfs_stop_id daytime_routes entrance_type entry_allowed exit_allowed entrance_latitude entrance_longitude entrance_georeference
0 BMT 4th Av B Atlantic Av-Barclays Ctr 617 Atlantic Av-Barclays Ctr 27 R31 2 3 4 5 B D N Q R Stair YES YES 40.683905 -73.978879 POINT (-73.978879 40.683905)
1 BMT 4th Av B Atlantic Av-Barclays Ctr 617 Atlantic Av-Barclays Ctr 27 R31 2 3 4 5 B D N Q R Elevator YES YES 40.683805 -73.978487 POINT (-73.978487 40.683805)
2 BMT 4th Av B Atlantic Av-Barclays Ctr 617 Atlantic Av-Barclays Ctr 27 R31 2 3 4 5 B D N Q R Stair YES YES 40.683928 -73.978412 POINT (-73.978412 40.683928)
InΒ [7]:
# ---- 3) Nearest subway entrance via KDTree ----
tree = KDTree(mta_latlon, metric="euclidean")

# Query nearest neighbor for each rat sighting
rat_latlon = rats[["latitude", "longitude"]].values
dist, idx = tree.query(rat_latlon, k=1)

# Convert approx degrees to meters (NYC scale, rough): 1 degree ~ 111,111 meters
rats["Distance to Subway Entrance"] = (dist.flatten() * 111_111).astype(float)

# Log-transform for color scaling (avoid log(0))
rats["log10_dist"] = np.log10(np.maximum(rats["Distance to Subway Entrance"], 1e-3))
rats[["latitude","longitude","Distance to Subway Entrance","log10_dist"]].head(3)
Out[7]:
latitude longitude Distance to Subway Entrance log10_dist
0 40.692989 -73.943771 1482.038542 3.170859
1 40.739983 -73.809299 9969.632839 3.998679
2 40.807367 -73.954388 328.194495 2.516131
InΒ [9]:
# ---- 4) Animated map ----
fig = px.scatter_mapbox(
    rats,
    lat="latitude",
    lon="longitude",
    animation_frame="Date",
    animation_group="unique_key" if "unique_key" in rats.columns else None,
    color="log10_dist",
    hover_data={"Date": True, "latitude": False, "longitude": False, "Distance to Subway Entrance": True},
    color_continuous_scale="armyrose",
    range_color=[0, 5],
    title="NYC Rat Sightings vs Distance to Nearest Subway Entrance"
)

# Animation timing tweaks (if frames exist)
if fig.layout.updatemenus and fig.layout.updatemenus[0].buttons:
    try:
        fig.layout.updatemenus[0].buttons[0].args[1]['frame']['duration'] = 300
        fig.layout.updatemenus[0].buttons[0].args[1]['transition']['duration'] = 20
    except Exception:
        pass

# Colorbar labels
fig.update_layout(coloraxis_colorbar=dict(
    title="Distance (approx.)",
    tickvals=[4, 3, 2, 1, 0],
    ticktext=["~10km", "~1km", "~100m", "~10m", "~1m"]
))

# Basemap: use Mapbox if token provided, else USGS raster tiles
if os.getenv("MAPBOX_TOKEN"):
    fig.update_layout(mapbox_style="streets")
else:
    fig.update_layout(
        mapbox_style="white-bg",
        mapbox_layers=[
            {
                "below": "traces",
                "sourcetype": "raster",
                "sourceattribution": "United States Geological Survey",
                "source": [
                    "https://basemap.nationalmap.gov/arcgis/rest/services/USGSImageryOnly/MapServer/tile/{z}/{y}/{x}"
                ],
            }
        ],
    )

fig.update_layout(height=700,width=1000,
                  margin={"r":0,"t":40,"l":0,"b":0},
                  mapbox=dict(
                    center={"lat": 40.741895, "lon": -73.989308},zoom=10))
fig.show()

Notes & EthicsΒΆ

  • Distances are approximate (Euclidean on lat/lon) and intended for visualization only.
  • Open data can contain errors; consider de-duplication and QC steps in production.
  • When mapping incident data, be mindful of privacy and context.